Motivation: The mapping of RNA-seq reads to their transcripts of origin is a fundamental task in transcript expression estimation and differential expression scoring. Where ambiguities in mapping exist because transcripts share sequence, e.g. alternative isoforms or alleles, the problem becomes an instance of non-trivial probabilistic inference. Exact Bayesian inference in such a problem is intractable, and approximate methods such as Markov chain Monte Carlo (MCMC) and Variational Bayes must be used. Standard implementations of these methods can be prohibitively slow for large datasets and complex gene models.

Results: We propose an approximate inference scheme based on Variational Bayes, applied to an existing model of transcript expression inference from RNA-seq data. We apply recent advances in Variational Bayes algorithmics to improve the convergence of the algorithm beyond the standard variational expectation-maximisation approach. We apply our algorithm to simulated and biological datasets, demonstrating that the increase in speed requires only a small trade-off in accuracy of expression level estimation.

Availability: The methods were implemented in R and C++, and are available as part of the BitSeq project at https://code.google.com/p/bitseq/. The methods will be made available through the BitSeq Bioconductor package at the next stable release.